Support QDQ format for weight-only quantization #35

mengniwang95 · 2024-08-14T09:34:00Z

Type of Change

feature

Description

Support QDQ format for weight-only quantization

It requires:

onnxruntime >= 1.19.0
opset_version of the model >= 21
quantized bits in [4, 8]
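
The three requirements above can be checked up front before attempting QDQ weight-only quantization. The following is a minimal sketch, not code from this PR: `parse_version` and `check_qdq_requirements` are hypothetical helper names, and reading "[4, 8]" as the two values 4 and 8 (rather than a range) is an assumption.

```python
import re

# Thresholds taken from the PR description.
MIN_ORT_VERSION = (1, 19, 0)
MIN_OPSET = 21
# Assumption: "[4, 8]" lists the two supported bit widths, not a range.
SUPPORTED_BITS = (4, 8)


def parse_version(version):
    """Turn '1.19.0' or '1.19.2+cpu' into a tuple of ints such as (1, 19, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def check_qdq_requirements(ort_version, opset_version, bits):
    """Return a list of unmet requirements; an empty list means all are met."""
    problems = []
    if parse_version(ort_version) < MIN_ORT_VERSION:
        problems.append(f"onnxruntime {ort_version} is older than 1.19.0")
    if opset_version < MIN_OPSET:
        problems.append(f"model opset {opset_version} is lower than {MIN_OPSET}")
    if bits not in SUPPORTED_BITS:
        problems.append(f"bits={bits} is not one of {SUPPORTED_BITS}")
    return problems


if __name__ == "__main__":
    for problem in check_qdq_requirements("1.18.1", 20, 4):
        print("unmet requirement:", problem)
```

In practice the inputs would presumably come from `onnxruntime.__version__` and the model's `opset_import` entry for the default ONNX domain; how this PR performs the check internally is not shown here.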

Expected Behavior & Potential Risk

the expected behavior that is triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

Signed-off-by: Mengni Wang <[email protected]>

Signed-off-by: Wang, Mengni <[email protected]>

Signed-off-by: Mengni Wang <[email protected]>

examples/nlp/huggingface_model/text_generation/quantization/weight_only/README.md

examples/nlp/huggingface_model/text_generation/quantization/weight_only/main.py

onnx_neural_compressor/algorithms/weight_only/gptq.py

onnx_neural_compressor/algorithms/weight_only/rtn.py

onnx_neural_compressor/quantization/algorithm_entry.py

Signed-off-by: Mengni Wang <[email protected]>

mengniwang95 · 2024-09-20T12:02:35Z

mengniwang95 added 7 commits August 6, 2024 21:38

add standard int4 op for woq

ca65281

Signed-off-by: Mengni Wang <[email protected]>

support int4 dequant

821ddc4

Signed-off-by: Mengni Wang <[email protected]>

support int4 QDQ

f092c80

Signed-off-by: Mengni Wang <[email protected]>

add ut and enhance code

5d7ee34

Signed-off-by: Mengni Wang <[email protected]>

Update README.md

7dcef73

Signed-off-by: Wang, Mengni <[email protected]>

Update README.md

13b69e3

Signed-off-by: Wang, Mengni <[email protected]>

simplify config and fix ut

04625d0

Signed-off-by: Mengni Wang <[email protected]>

mengniwang95 force-pushed the mengni/int4_qdq branch from 4f66e64 to 04625d0 Compare August 16, 2024 08:35

mengniwang95 added 3 commits August 16, 2024 01:48

fix bug

54b5388

Signed-off-by: Mengni Wang <[email protected]>

improve ut coverage

be59ac4

Signed-off-by: Mengni Wang <[email protected]>

add ut

2ec554c

Signed-off-by: Mengni Wang <[email protected]>

mengniwang95 force-pushed the mengni/int4_qdq branch from 6c5bd02 to 2ec554c Compare August 19, 2024 07:33

mengniwang95 added 2 commits August 20, 2024 17:06

enhance dump func

d59fdca

Signed-off-by: Mengni Wang <[email protected]>

fix config setting

27eae66

Signed-off-by: Mengni Wang <[email protected]>

thuang6 reviewed Sep 3, 2024

github-advanced-security bot found potential problems Sep 13, 2024

fix acc issue and refine code

b51d8ec

Signed-off-by: Mengni Wang <[email protected]>

mengniwang95 force-pushed the mengni/int4_qdq branch from 9083a19 to b51d8ec Compare September 19, 2024 06:57

mengniwang95 added 4 commits September 19, 2024 15:04

fix CI

d455061

Signed-off-by: Mengni Wang <[email protected]>

fix ut

f343357

Signed-off-by: Mengni Wang <[email protected]>

update code

745b099

Signed-off-by: Mengni Wang <[email protected]>

remove unused code and add ut

9add47e

Signed-off-by: Mengni Wang <[email protected]>

thuang6 approved these changes Sep 23, 2024

thuang6 merged commit 05bb58a into main Sep 23, 2024
14 checks passed
